Optimal Cross-Validation Split Ratio: Experimental Investigation

Author

  • Cyril Goutte
Abstract

Cross-validation is a widespread method for assessing the generalisation ability of a model in order to tune a regularisation parameter or other hyper-parameters of a learning process. Using cross-validation requires setting yet another parameter, the split ratio. Few works have theoretically investigated the asymptotic setting of this ratio, and no consensus has emerged. In this contribution, we investigate the sensitivity and optimal setting of the split ratio on a particular model, a non-parametric kernel estimator with adaptive metric.

1 Cross-validation

Most efficient learning procedures require the setting of an extra learning parameter, or "hyper-parameter". Neural networks typically use a regularisation parameter weighting a weight decay [1], or the extent of pruning [2]. Estimating the "optimal" hyper-parameter is the topic of active current research in the statistical learning community [3]. Let us consider a typical learning problem: modelling an input-output relationship based on some empirical data
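To make the role of the split ratio concrete, here is a minimal sketch of hold-out validation used to pick a regularisation parameter. It rests on several assumptions that are not in the abstract: the split ratio is taken to be the fraction of examples held out for validation, a one-dimensional ridge regression stands in for the paper's kernel estimator with adaptive metric, and the names `split_indices`, `ridge_fit_1d` and `holdout_select` are hypothetical.

```python
import random


def split_indices(n, split_ratio, rng):
    """Randomly partition range(n) into (train, validation) index lists.

    Assumption: split_ratio is the fraction of examples held out for
    validation. The abstract does not fix this convention.
    """
    n_val = int(round(split_ratio * n))
    n_val = min(max(n_val, 1), n - 1)  # keep both subsets non-empty
    perm = list(range(n))
    rng.shuffle(perm)
    return perm[n_val:], perm[:n_val]


def ridge_fit_1d(xs, ys, lam):
    """Closed-form ridge estimate for y ~ w * x (illustrative stand-in model)."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + lam
    return num / den


def holdout_select(xs, ys, lambdas, split_ratio, seed=0):
    """Return (best lambda, validation errors) for one random split."""
    rng = random.Random(seed)
    train, val = split_indices(len(xs), split_ratio, rng)
    errors = []
    for lam in lambdas:
        # Fit on the training part only.
        w = ridge_fit_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        # Mean squared error on the held-out part estimates generalisation.
        mse = sum((w * xs[i] - ys[i]) ** 2 for i in val) / len(val)
        errors.append(mse)
    best = min(range(len(lambdas)), key=errors.__getitem__)
    return lambdas[best], errors
```

Calling `holdout_select` with several values of `split_ratio` on the same data gives a rough feel for how sensitive the selected hyper-parameter is to the split, which is the kind of question the abstract raises. A single random split is noisy, so averaging over several seeds would be a natural refinement.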

Similar resources

On Optimal Data Split for Generalization Estimation and Model Selection

Modeling with flexible models, such as neural networks, requires careful control of the model complexity and generalization ability of the resulting model. Whereas general asymptotic estimators of generalization ability have been developed over recent years (e.g., [9]), it is widely acknowledged that in most modeling scenarios there isn't sufficient data available to reliably use these estimato...

Development of Flow within a Diffusing C-Duct –Experimental Investigation and Numerical Validation

Experimental investigation of flow development within a rectangular 90° curved diffusing C-duct of low aspect ratio and area ratio of 2 was carried out and the three-dimensional computational results are then compared with the experimental results for numerical validation. All measurements were made in a turbulent flow regime (Re = 2.35 × 10^5), based on the duct inlet hydraulic diameter (dh = 0....

A toolkit for cross-validation: The R package cvTools

The idea of cross-validation is simple and easy to implement: split the data into several blocks, leave out one block for model estimation, and predict the values of the left-out block. Those predictions are then used to compute a certain prediction loss function. Even though the basic procedure is simple, some additional programming effort is necessary for more complex procedures such as repea...

Parallel Sampling of HDPs using Sub-Cluster Splits

We develop a sampling technique for Hierarchical Dirichlet process models. The parallel algorithm builds upon [1] by proposing large split and merge moves based on learned sub-clusters. The additional global split and merge moves drastically improve convergence in the experimental results. Furthermore, we discover that cross-validation techniques do not adequately determine convergence, and tha...

Determining optimal value of the shape parameter $c$ in RBF for unequal distances topographical points by Cross-Validation algorithm

Several radial basis function based methods contain a free shape parameter which has a crucial role in the accuracy of the methods. Performance evaluation of this parameter in different functions with various data has always been a topic of study. In the present paper, we consider studying the methods which determine an optimal value for the shape parameter in interpolations of radial basis ...

Publication date: 1998